Abstract: The core research problem in Content-Based Video Retrieval (CBVR) is to automatically parse video and text to identify meaningful compositional structure. To facilitate fast and accurate content-based access to video data, a video document should be segmented into shots and scenes. Recognizing object sequences in videos is an important problem in computer vision applications such as web search, target recognition, surveillance, and crime detection. The aim is to build an efficient video retrieval system that focuses on features such as color, texture, shape, motion, and visual text embedded in an image. Multimodality is the capacity of a system to accept one or more types of input in the search process, such as text, images, audio signals, or text embedded in an image. It thus provides potentially more accurate results. These results are used in video searching, video surveillance, and applications involving text embedded in images. In CBVR, a video is first segmented as a preprocessing step, and key frames are used for feature extraction. Clustering and indexing are then performed with k-means clustering and an HCT (Hierarchical Clustering Tree), followed by similarity matching.

Keywords: Content-Based Video Retrieval, HCT (Hierarchical Clustering Tree), Keyframes, Multimodal.
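
The pipeline outlined in the abstract (key frame selection, feature extraction, clustering, similarity matching) can be illustrated with a minimal sketch. It is not the system described here: it assumes frames are already decoded into NumPy arrays, uses a coarse color histogram as the only feature, selects key frames with a simple histogram-difference threshold, and uses plain k-means in place of the HCT index. All function names, the `threshold` value, and the synthetic frames are hypothetical and chosen only for illustration.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Per-channel intensity histogram, concatenated and L1-normalised."""
    hist = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
            for c in range(frame.shape[-1])]
    hist = np.concatenate(hist).astype(float)
    return hist / hist.sum()

def select_keyframes(frames, threshold=0.5):
    """Keep frame 0, then every frame whose histogram differs from the
    last kept key frame by more than `threshold` (L1 distance).
    The threshold is an assumed value, not one taken from the paper."""
    keyframes = [0]
    last = color_histogram(frames[0])
    for i in range(1, len(frames)):
        h = color_histogram(frames[i])
        if np.abs(h - last).sum() > threshold:
            keyframes.append(i)
            last = h
    return keyframes

def assign(features, centroids):
    """Index of the nearest centroid (Euclidean) for each feature vector."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; stands in for the HCT index here."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids, assign(features, centroids)

def retrieve(query_feature, features, centroids, labels):
    """Similarity matching: search only the cluster nearest to the query
    and rank its members by Euclidean distance."""
    nearest = np.linalg.norm(centroids - query_feature, axis=1).argmin()
    candidates = np.flatnonzero(labels == nearest)
    d = np.linalg.norm(features[candidates] - query_feature, axis=1)
    return candidates[d.argsort()]

# Synthetic "video": three shots of five uniform frames each.
frames = [np.full((4, 4, 3), v, dtype=np.uint8)
          for v in (20, 120, 230) for _ in range(5)]

keyframes = select_keyframes(frames)
features = np.array([color_histogram(frames[i]) for i in keyframes])
centroids, labels = kmeans(features, k=3)

# Query with a frame from the second shot; the result indexes into `keyframes`.
ranked = retrieve(color_histogram(frames[7]), features, centroids, labels)
```

Restricting the search to the nearest cluster is what makes the index useful: the query is compared against a handful of centroids and one cluster's members rather than every key frame. A hierarchical index would apply the same idea recursively.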